Abstract

In this paper we propose a general framework to characterize and solve the
stochastic optimization problems with multiple objectives underlying many real
world learning applications. We first propose a projection based algorithm which
attains an $O(T^{-1/3})$ convergence rate. Then, by leveraging the theory of
Lagrangian in constrained optimization, we devise a novel primal-dual stochastic
approximation algorithm which attains the optimal convergence rate of $O(T^{-1/2})$
for general Lipschitz continuous objectives.

1 Introduction

Stochastic optimization algorithms such as stochastic gradient descent (SGD) [9, 1] and its online
counterpart, online gradient descent (OGD) [13, 11], have been the focus of intensive study in the last
few years. In the traditional setup of stochastic optimization, the learner is evaluated by a single loss
function at each iteration. However, in many real world applications, the learner needs to consider
several performance measures simultaneously [3], and we are not aware of any work addressing
multi-objective stochastic problems.

In this work, we generalize online convex optimization (OCO) to the case of multiple objectives.
In particular, at each iteration, the learner is asked to present a solution $\mathbf{x}_t$, which will be
evaluated by multiple loss functions $f_t^0(\mathbf{x}), f_t^1(\mathbf{x}), \ldots, f_t^m(\mathbf{x})$. Since it is impossible to
simultaneously minimize multiple loss functions, and in order to avoid complications caused by handling
more than one objective, we choose one function as the objective and try to bound the other objectives
by appropriate thresholds. Specifically, the goal of OCO with multiple objectives becomes to minimize
$\sum_{t=1}^{T} f_t^0(\mathbf{x}_t)$ and at the same time keep the other loss functions below a given threshold, i.e.,

$$\frac{1}{T}\sum_{t=1}^{T} f_t^i(\mathbf{x}_t) \le \gamma_i, \quad i \in [m],$$

where $\mathbf{x}_1, \ldots, \mathbf{x}_T$ are the solutions generated by the online learner and $\gamma_i$ specifies the level of
loss that is acceptable to the $i$-th loss function. We refer to the above problem as online convex
optimization with multiple objectives. The proposed problem is closely related to the classical
study of multiple objective optimization [10]. The main difference is that all the objectives (i.e., the
loss functions) are changing over the iterations, making it a substantially more difficult problem. The
proposed problem is also closely related to online optimization with side constraints [6, 4], where
the constraint introduced is essentially the second objective in multiple objective optimization. The
proposed problem generalizes online optimization with side constraints by allowing more than one
constraint.

Since the general setup (i.e., the fully adversarial setup) is challenging for online convex optimization
even with two objectives [7], in this work we consider a simple scenario where all the loss functions
$\{f_t^i(\cdot)\}_{i=1}^{m}$ are i.i.d. samples from an unknown distribution [12]. We also note that our goal is not to
find a sample from the Pareto optimal set (i.e., the set of solutions that are not dominated in the
Pareto sense in the decision space); instead, we are trying to keep all the objectives below
pre-specified thresholds. We denote by $\bar f_i(\cdot) = \mathrm{E}_t[f_t^i(\cdot)]$, $i = 0, 1, \ldots, m$, the expected loss function of
the sampled function $f_t^i(\cdot)$. To solve the problem, as is standard in stochastic optimization, we assume
that we do not have direct access to the expected loss functions, and the only information available to
the solver is through a noisy oracle which provides a stochastic realization of the expected loss
function at each call. We assume that there exists a solution $\mathbf{x}$ strictly satisfying all the constraints,
i.e., $\bar f_i(\mathbf{x}) < \gamma_i$, $i \in [m]$. We denote by $\mathbf{x}_*$ the optimal solution to the multiple objectives, i.e.,

$$\mathbf{x}_* = \arg\min \left\{ \bar f_0(\mathbf{x}) : \bar f_i(\mathbf{x}) \le \gamma_i,\ i = 1, \ldots, m \right\}.$$

Our goal is to compute a solution $\hat{\mathbf{x}}_T$ after $T$ trials that (i) obeys all the constraints, i.e.,
$\bar f_i(\hat{\mathbf{x}}_T) \le \gamma_i$, $i \in [m]$, and (ii) minimizes the regret with respect to the optimal solution $\mathbf{x}_*$, i.e.,
$\bar f_0(\hat{\mathbf{x}}_T) - \bar f_0(\mathbf{x}_*)$. For the convenience of discussion, we refer to $f_t^0(\cdot)$ and $\bar f_0(\cdot)$ as the objective
function, and to $f_t^i(\cdot)$ and $\bar f_i(\cdot)$ as the constraint functions.

Before discussing the algorithms, we first describe a few assumptions made in our analysis. We
assume that the final solution $\mathbf{x}_*$ lives in a ball $\mathcal{B}$ of radius $R$, i.e., $\mathcal{B} = \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| \le R\}$.
We also make the standard assumption that all the loss functions are Lipschitz continuous, i.e.,

$$|f_t^i(\mathbf{x}) - f_t^i(\mathbf{x}')| \le L\|\mathbf{x} - \mathbf{x}'\| \quad \text{for any } \mathbf{x} \in \mathcal{B} \text{ and } \mathbf{x}' \in \mathcal{B}.$$

In this extended abstract, we only sketch the results and omit many important details, which appear
in the full version of our paper. In Section 2 we propose a projection based algorithm which reduces
the problem to a standard optimization problem with a changing solution space. Section 3 introduces
our efficient primal-dual stochastic optimization algorithm achieving the optimal known bound.



2 Warmup: a Projection based Algorithm 

The main challenge of the proposed problem is that the expected constraint functions $\bar f_i(\cdot)$ are not
given. Instead, only a sampled constraint function is provided at each trial $t$. Our naive approach is
to turn the multiple objective optimization problem into a constrained optimization problem as

$$\min_{\mathbf{x} \in \mathcal{K}} \bar f_0(\mathbf{x}) \qquad (1)$$

where the domain $\mathcal{K}$ is defined as $\mathcal{K} = \{\mathbf{x} : \bar f_i(\mathbf{x}) \le \gamma_i,\ i = 1, \ldots, m\}$.

This approach reduces the problem of optimizing multiple objectives to the original online
convex optimization problem with complex projections. Since the domain $\mathcal{K}$ is unknown in (1), a naive
approach is to estimate the expected constraint functions based on the sampled constraints received
so far, and project the updated solution into the domain constructed from the estimated constraint
functions. More specifically, at trial $t$, given the current solution $\mathbf{x}_t$ and the received loss functions
$f_t^i(\mathbf{x})$, $i = 0, 1, \ldots, m$, we first estimate the expected constraint functions as

$$\hat f_t^i(\mathbf{x}) = \frac{1}{t}\sum_{k=1}^{t} f_k^i(\mathbf{x}), \quad i \in [m],$$

and then update the solution by $\mathbf{x}_{t+1} = \Pi_{\mathcal{K}_t}\left(\mathbf{x}_t - \eta \nabla f_t^0(\mathbf{x}_t)\right)$, where
$\Pi_{\mathcal{K}}(\mathbf{z}) = \arg\min_{\mathbf{x} \in \mathcal{K}} \|\mathbf{z} - \mathbf{x}\|$, and $\mathcal{K}_t$ is an approximate domain given by
$\mathcal{K}_t = \{\mathbf{x} : \hat f_t^i(\mathbf{x}) \le \gamma_i,\ i = 1, \ldots, m\}$.

The problem with the above approach is that although it is feasible to satisfy all the constraints based
on the true expected constraint functions, there is no guarantee that the approximate domain $\mathcal{K}_t$ is
not empty. One way to address this issue is to estimate the expected constraint functions by burning
the first $bT$ trials, where $b \in (0, 1)$ is a constant that needs to be adjusted to obtain the optimal
performance. Given the sampled constraint functions $f_1^i, \ldots, f_{bT}^i$ received in the first $bT$ trials, we
compute the approximate domain $\mathcal{K}'$ as

$$\hat f_i(\mathbf{x}) = \frac{1}{bT}\sum_{t=1}^{bT} f_t^i(\mathbf{x}),\ i \in [m], \qquad \mathcal{K}' = \left\{\mathbf{x} : \hat f_i(\mathbf{x}) \le \hat\gamma_i,\ i = 1, \ldots, m\right\},$$

where $\hat\gamma_i = \gamma_i + LR\sqrt{\frac{2}{bT}\ln\frac{m}{\delta}}$. It is straightforward to show that, with a probability $1 - \delta$,
for any $\mathbf{x} \in \mathcal{K}$, we have $\mathbf{x} \in \mathcal{K}'$. We note that for projection onto the estimated domain, we
consider only a special solution, and therefore the negative results on uniform convergence [12] do
not apply. Using the approximate domain $\mathcal{K}'$, for trial $t \in [bT + 1, T]$, we update the solution by

$$\mathbf{x}_{t+1} = \Pi_{\mathcal{K}'}\left(\mathbf{x}_t - \eta \nabla f_t^0(\mathbf{x}_t)\right).$$
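The burn-in estimate can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the slack term reads $LR\sqrt{(2/(bT))\ln(m/\delta)}$, and the function names `relaxed_threshold` and `average_constraint` are hypothetical.

```python
import math


def relaxed_threshold(gamma_i, lipschitz, radius, burn_fraction, horizon,
                      num_constraints, delta):
    """Return gamma_i + L * R * sqrt((2 / (b T)) * ln(m / delta)).

    Assumption: this is the form of the relaxed threshold used for the
    approximate domain K'; all argument names are illustrative.
    """
    n_burn = burn_fraction * horizon  # number of burn-in trials, bT
    slack = lipschitz * radius * math.sqrt(
        2.0 / n_burn * math.log(num_constraints / delta))
    return gamma_i + slack


def average_constraint(samples):
    """Return the empirical constraint x -> mean of the sampled functions."""
    def f_hat(x):
        return sum(f(x) for f in samples) / len(samples)
    return f_hat
```

The slack shrinks like $1/\sqrt{bT}$, so a longer burn-in phase gives a tighter approximate domain at the cost of more trials spent without any guarantee.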
There are however several drawbacks with this approach. First, by simple counting, it is not difficult
to see that the overall violation of constraints, given by $\sum_{t=1}^{T} \bar f_i(\mathbf{x}_t)$, is $O(bT + (1 - b)T/\sqrt{bT})$:
$O(bT)$ comes from the first $bT$ trials used to estimate the expected constraint functions, where the
violation of each trial is about $O(1)$, and $(1 - b)T/\sqrt{bT}$ comes from the remaining $(1 - b)T$ trials, where the
violation is $O(1/\sqrt{bT})$. By minimizing the overall violation, we choose $b = O(T^{-1/3})$, leading to
an overall violation of $O(T^{2/3})$. Using the same trick as in [5], we could obtain a solution with zero
violation of constraints but with a regret bound of $O(T^{2/3})$, which is an unsatisfactory result. Second,
this approach requires memorizing the constraint functions of the first $bT$ trials. This is in contrast to
the typical assumption of online learning where only the solution is memorized. Third, even though
the difference between $\hat f_i(\cdot)$ and $\bar f_i(\cdot)$ is small, i.e., $\max_{\mathbf{x} \in \mathcal{B}} |\hat f_i(\mathbf{x}) - \bar f_i(\mathbf{x})| = O(1/\sqrt{bT})$, $\hat f_i(\cdot)$
could be non-convex, leading to inefficient computation when performing the projection.
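The choice of the burn-in fraction can be checked numerically. The sketch below minimizes the violation expression $bT + (1 - b)T/\sqrt{bT}$ over a grid, with all hidden constants set to 1 (an assumption made only for illustration); the helper names are hypothetical.

```python
import math


def violation_bound(b, T):
    """Overall violation b*T + (1 - b)*T / sqrt(b*T), constants set to 1."""
    return b * T + (1.0 - b) * T / math.sqrt(b * T)


def best_burn_fraction(T, grid_size=10000):
    """Grid search for the burn-in fraction b in (0, 1) minimizing the bound."""
    candidates = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return min(candidates, key=lambda b: violation_bound(b, T))
```

For large `T`, the minimizer returned by `best_burn_fraction` stays within a constant factor of `T ** (-1/3)`, and the minimized bound stays within a constant factor of `T ** (2/3)`, matching the rates quoted above.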

As indicated by the above analysis, the main limitation of the naive approach is that it requires a
projection step. To address this limitation, we present an algorithm that does not require projection
when updating the solution. We show that, with high probability, the solution found by the proposed
algorithm will exactly satisfy the expected constraints and achieve a regret bound of $O(\sqrt{T})$.



3 An Efficient Online Stochastic Primal Dual Algorithm 



The main idea of the proposed algorithm is to design an appropriate objective that combines the loss
function $\bar f_0$ with $\{\bar f_i\}_{i=1}^{m}$. To this end, we define the following objective function

$$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}) = \bar f_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i \left(\bar f_i(\mathbf{x}) - \gamma_i\right).$$

Note that the objective function involves both the primal variables $\mathbf{x}$ and the dual variables
$\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)$. In the proposed algorithm, we will simultaneously update solutions for both $\mathbf{x}$ and $\boldsymbol{\lambda}$.
By exploiting convex-concave optimization theory [8], we will show that, with high probability, the
algorithm finds a solution with regret $O(\sqrt{T})$ that exactly obeys the constraints.
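To make the combined objective concrete, the following sketch evaluates it from a list of loss values at a point, with the objective value first and the constraint values after it. The function name `lagrangian` and the list layout are illustrative assumptions. Because the expression is linear in the loss values, averaging sampled losses and then forming the objective gives the same number as averaging the per-sample objectives.

```python
def lagrangian(values, lam, gammas):
    """Evaluate values[0] + sum_i lam[i] * (values[i + 1] - gammas[i]).

    values[0] is the objective value at a fixed point x, and values[1:]
    are the constraint values at that same point (hypothetical layout).
    """
    return values[0] + sum(
        l * (v - g) for l, v, g in zip(lam, values[1:], gammas))


# Two equally likely samples of the loss values at a fixed point x.
sample_a = [1.0, 2.0]
sample_b = [3.0, 0.0]
mean_values = [(a + b) / 2.0 for a, b in zip(sample_a, sample_b)]
mean_of_objectives = (lagrangian(sample_a, [0.5], [1.0])
                      + lagrangian(sample_b, [0.5], [1.0])) / 2.0
objective_of_means = lagrangian(mean_values, [0.5], [1.0])
```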

As the first step, we consider a simple scenario where the obtained solution is allowed to violate
the constraints. Algorithm 1 shows the detailed steps. It follows the same procedure as convex-concave
optimization. Since at each iteration we only observe randomly sampled loss functions
$f_t^i(\cdot)$, $i = 0, 1, \ldots, m$, the objective function given by

$$\mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda}) = f_t^0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i \left(f_t^i(\mathbf{x}) - \gamma_i\right)$$

provides an unbiased estimate of $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda})$. Given the approximate objective $\mathcal{L}_t(\mathbf{x}, \boldsymbol{\lambda})$, Algorithm 1
tries to minimize the objective $\mathcal{L}_t(\cdot, \cdot)$ with respect to the primal variable $\mathbf{x}$ and maximize the
objective with respect to the dual variable $\boldsymbol{\lambda}$. In the following theorem, we show that under appropriate
conditions, the solution $\hat{\mathbf{x}}_T$ output by Algorithm 1 will have a convergence rate of $O(1/\sqrt{T})$ for both
the regret and the violation of the constraints. To facilitate the analysis, we rewrite the constrained
optimization problem in (1) as

$$\min_{\mathbf{x} \in \mathcal{B}} \max_{\boldsymbol{\lambda} \in \mathbb{R}_+^m} \bar f_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i \left(\bar f_i(\mathbf{x}) - \gamma_i\right) \qquad (2)$$

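We denote by $\mathbf{x}_*$ and $\boldsymbol{\lambda}_* = (\lambda_*^1, \ldots, \lambda_*^m)^\top$ the optimal solution to the above convex-concave
optimization problem, i.e.,

$$\mathbf{x}_* = \arg\min_{\mathbf{x} \in \mathcal{B}} \bar f_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_*^i \left(\bar f_i(\mathbf{x}) - \gamma_i\right) \qquad (3)$$

$$\boldsymbol{\lambda}_* = \arg\max_{\boldsymbol{\lambda} \in \mathbb{R}_+^m} \bar f_0(\mathbf{x}_*) + \sum_{i=1}^{m} \lambda_i \left(\bar f_i(\mathbf{x}_*) - \gamma_i\right) \qquad (4)$$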

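We define two quantities that are useful for bounding the gradients $\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda})$ and $\nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda})$:

$$D^2 = \sum_{i=1}^{m} [\lambda_i^0]^2, \qquad G^2 = \left(1 + \sum_{i=1}^{m} \lambda_i^0\right)^2 + \max_{\mathbf{x} \in \mathcal{B}} \sum_{i=1}^{m} \bar f_i^2(\mathbf{x}) \qquad (5)$$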
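Algorithm 1 Online Convex Optimization with Multiple Objectives

INPUT: step size $\eta$, $\lambda_i^0 > 0$, $i \in [m]$, and total time $T$
$\mathbf{x}_1 = \mathbf{0}$
for $t = 1, \ldots, T$ do
    Submit the solution $\mathbf{x}_t$
    Receive loss functions $f_t^i$, $i = 0, 1, \ldots, m$
    Compute the gradients $\nabla f_t^i(\mathbf{x}_t)$, $i = 0, 1, \ldots, m$
    Update the solutions $\mathbf{x}$ and $\boldsymbol{\lambda}$ by

    $$\mathbf{x}_{t+1} = \Pi_{\mathcal{B}}\left(\mathbf{x}_t - \eta \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) = \Pi_{\mathcal{B}}\left(\mathbf{x}_t - \eta \left[\nabla f_t^0(\mathbf{x}_t) + \sum_{i=1}^{m} \lambda_t^i \nabla f_t^i(\mathbf{x}_t)\right]\right)$$

    $$\lambda_{t+1}^i = \Pi_{[0, \lambda_i^0]}\left(\lambda_t^i + \eta \nabla_{\lambda_i} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) = \Pi_{[0, \lambda_i^0]}\left(\lambda_t^i + \eta \left[f_t^i(\mathbf{x}_t) - \gamma_i\right]\right)$$
end for
Return $\hat{\mathbf{x}}_T = \sum_{t=1}^{T} \mathbf{x}_t / T$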
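The updates of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the `oracle` callback, the zero initialization of the dual variables, the toy one-dimensional problem, and the step size `1 / sqrt(T)` are all choices made here for the example.

```python
import math
import random


def project_ball(x, radius):
    """Euclidean projection of the list x onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def primal_dual(oracle, dim, gammas, lambda_max, radius, eta, T):
    """Run T rounds of projected primal descent and dual ascent.

    oracle(x) is a hypothetical callback returning (grads, values), where
    grads[i] and values[i] are the gradient and value of the sampled loss
    f_t^i at x, for i = 0 (objective) and i = 1..m (constraints).
    """
    m = len(gammas)
    x = [0.0] * dim
    lam = [0.0] * m  # assumption: dual variables start at zero
    x_sum = [0.0] * dim
    for _ in range(T):
        x_sum = [s + v for s, v in zip(x_sum, x)]
        grads, values = oracle(x)
        # Gradient of the sampled Lagrangian with respect to x.
        gx = list(grads[0])
        for i in range(m):
            gx = [g + lam[i] * gi for g, gi in zip(gx, grads[i + 1])]
        new_x = project_ball([v - eta * g for v, g in zip(x, gx)], radius)
        # Dual ascent on each lambda_i, clipped to [0, lambda_max[i]].
        lam = [
            min(max(lam[i] + eta * (values[i + 1] - gammas[i]), 0.0),
                lambda_max[i])
            for i in range(m)
        ]
        x = new_x
    return [s / T for s in x_sum]  # averaged solution


def run_toy_example(T=20000, seed=0):
    """Toy 1-D instance: minimize E[(x - 1 - xi)^2 / 2] s.t. E[x + zeta] <= 0.5."""
    rng = random.Random(seed)

    def oracle(x):
        xi = rng.gauss(0.0, 0.1)    # noise on the objective
        zeta = rng.gauss(0.0, 0.1)  # noise on the constraint
        values = [0.5 * (x[0] - 1.0 - xi) ** 2, x[0] + zeta]
        grads = [[x[0] - 1.0 - xi], [1.0]]
        return grads, values

    return primal_dual(oracle, dim=1, gammas=[0.5], lambda_max=[1.5],
                       radius=2.0, eta=1.0 / math.sqrt(T), T=T)
```

In the toy instance the constrained optimum is at 0.5 with dual value 0.5, so `lambda_max=[1.5]` leaves a margin of 1 above the optimal dual variable, and the averaged solution returned by `run_toy_example()` should land close to 0.5.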



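Theorem 1. Set $\lambda_i^0 \ge \lambda_*^i + \theta$, $i \in [m]$, where $\theta > 0$ is a constant. Let $\hat{\mathbf{x}}_T$ be the solution obtained by
Algorithm 1 after $T$ iterations. Then, with a probability $1 - (2m + 1)\delta$, we have

$$\bar f_0(\hat{\mathbf{x}}_T) - \bar f_0(\mathbf{x}_*) \le \frac{\mu(\delta)}{\sqrt{T}} \quad \text{and} \quad \bar f_i(\hat{\mathbf{x}}_T) - \gamma_i \le \frac{\mu(\delta)}{\theta\sqrt{T}},\ i \in [m],$$

where

$$\mu(\delta) = G\sqrt{R^2 + D^2} + 2G(R + D)\sqrt{2\ln\frac{1}{\delta}}. \qquad (6)$$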
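To get a feel for the size of these bounds, the constant in (6) can be evaluated numerically. The sketch below assumes the two bounds take the form $\mu(\delta)/\sqrt{T}$ for the regret and $\mu(\delta)/(\theta\sqrt{T})$ for the constraint violation; the function names are hypothetical.

```python
import math


def mu_delta(G, R, D, delta):
    """mu(delta) = G*sqrt(R^2 + D^2) + 2*G*(R + D)*sqrt(2*ln(1/delta))."""
    return (G * math.sqrt(R ** 2 + D ** 2)
            + 2.0 * G * (R + D) * math.sqrt(2.0 * math.log(1.0 / delta)))


def theorem1_bounds(G, R, D, delta, theta, T):
    """Return (regret bound, violation bound) under the assumed forms."""
    regret = mu_delta(G, R, D, delta) / math.sqrt(T)
    violation = regret / theta
    return regret, violation
```

Both bounds shrink at the rate $1/\sqrt{T}$, and the violation bound additionally scales with $1/\theta$, so a larger margin $\theta$ tightens the constraint guarantee while increasing $D$ and $G$ through $\lambda_i^0$.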



We now develop an algorithm that allows the solution to exactly satisfy all the constraints. To this
end, we define $\hat\gamma_i = \gamma_i - \frac{\mu(\delta)}{\theta\sqrt{T}}$. We will run Algorithm 1 but with $\gamma_i$ replaced by $\hat\gamma_i$. The following
theorem shows the property of the obtained solution.

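Theorem 2. Let $\hat{\mathbf{x}}_T$ be the solution obtained by Algorithm 1 with $\gamma_i$ replaced by $\hat\gamma_i$ and
$\lambda_i^0 = \lambda_*^i + \theta$, $i \in [m]$. Then, with a probability $1 - (2m + 1)\delta$, we have

$$\bar f_0(\hat{\mathbf{x}}_T) - \bar f_0(\mathbf{x}_*) \le \left(1 + \frac{\sum_{i=1}^{m} \lambda_i^0}{\theta}\right)\frac{\mu(\delta)}{\sqrt{T}}, \qquad \bar f_i(\hat{\mathbf{x}}_T) \le \gamma_i,\ i \in [m],$$

where $\mu(\delta)$ is defined in (6).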

In order to run Algorithm 1, we need to estimate the parameter $\lambda_i^0$, which requires estimating an
upper bound for $\lambda_*^i$. To this end, we consider an alternative problem to the convex-concave
optimization problem in (2), i.e.,

$$\min_{\mathbf{x} \in \mathcal{B}} \max_{\lambda \ge 0} \bar f_0(\mathbf{x}) + \lambda \max_{1 \le i \le m} \left(\bar f_i(\mathbf{x}) - \gamma_i\right) \qquad (7)$$

Evidently $\mathbf{x}_*$ is the optimal primal solution to (7). Let $\lambda_a$ be the optimal dual solution to the problem
in (7). We have the following proposition that links $\lambda_*^i$, $i \in [m]$, the optimal dual solution to (2),
with $\lambda_a$, the optimal dual solution to (7).

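Proposition 1. Let $\lambda_a$ be the optimal dual solution to (7) and $\lambda_*^i$, $i \in [m]$, be the optimal solution to
(2). We have $\lambda_a = \sum_{i=1}^{m} \lambda_*^i$.

Given the result from Proposition 1, it is sufficient to bound $\lambda_a$. In order to bound $\lambda_a$, we need to
make a certain assumption about $\bar f_i$, $i \in [m]$.

Assumption 1. We assume $\min_{\boldsymbol{\alpha} \in \Delta_m} \left\|\sum_{i=1}^{m} \alpha_i \nabla \bar f_i(\mathbf{x})\right\| \ge \tau$, where $\tau > 0$ is a constant and the domain
$\Delta_m$ is defined as $\Delta_m = \{\boldsymbol{\alpha} \in \mathbb{R}_+^m : \sum_{i=1}^{m} \alpha_i = 1\}$.

The following lemma bounds $\lambda_a$ by $\tau$.

Lemma 1. Under Assumption 1, we have $\lambda_a \le \frac{L}{\tau}$.

Combining Proposition 1 with Lemma 1, we have, under Assumption 1, $\lambda_*^i \le \frac{L}{\tau}$, $i = 1, \ldots, m$.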
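The chain of bounds above suggests one way to set the input $\lambda_i^0$ of Algorithm 1. The following is a sketch, assuming that Lemma 1 reads $\lambda_a \le L/\tau$, that the optimal dual variables are nonnegative, and that $\theta > 0$ is the margin required by Theorem 1:

$$\lambda_*^i \;\le\; \sum_{j=1}^{m} \lambda_*^j \;=\; \lambda_a \;\le\; \frac{L}{\tau} \quad\Longrightarrow\quad \lambda_i^0 = \frac{L}{\tau} + \theta \;\ge\; \lambda_*^i + \theta, \quad i \in [m].$$

With this choice the condition of Theorem 1 holds without knowing $\lambda_*^i$ itself, at the price of needing the constants $L$ and $\tau$.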
4 Conclusion

In this paper we have addressed the problem of online stochastic optimization with multiple objectives
and presented an efficient primal-dual algorithm which attains the optimal convergence rate
$O(1/\sqrt{T})$ for all the objectives.
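Appendix A. Proof of Theorem 1

Using the standard analysis of convex-concave optimization, for any $\mathbf{x} \in \mathcal{B}$ and $\lambda_i \in [0, \lambda_i^0]$, $i \in [m]$,
we have

$$\begin{aligned}
\mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}) - \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}_t)
&\le (\mathbf{x}_t - \mathbf{x})^\top \nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - (\boldsymbol{\lambda}_t - \boldsymbol{\lambda})^\top \nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) \\
&= (\mathbf{x}_t - \mathbf{x})^\top \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) - (\boldsymbol{\lambda}_t - \boldsymbol{\lambda})^\top \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t) \\
&\quad + (\mathbf{x}_t - \mathbf{x})^\top \left(\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) - (\boldsymbol{\lambda}_t - \boldsymbol{\lambda})^\top \left(\nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) \\
&\le \frac{\|\mathbf{x} - \mathbf{x}_t\|^2 - \|\mathbf{x} - \mathbf{x}_{t+1}\|^2}{2\eta} + \frac{\|\boldsymbol{\lambda} - \boldsymbol{\lambda}_t\|^2 - \|\boldsymbol{\lambda} - \boldsymbol{\lambda}_{t+1}\|^2}{2\eta} + \frac{\eta}{2}\left(\|\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2 + \|\nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2\right) \\
&\quad + (\mathbf{x}_t - \mathbf{x})^\top \left(\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) - (\boldsymbol{\lambda}_t - \boldsymbol{\lambda})^\top \left(\nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right).
\end{aligned}$$

By adding all the inequalities together, we have

$$\begin{aligned}
\sum_{t=1}^{T} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}) - \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}_t)
&\le \frac{\|\mathbf{x} - \mathbf{x}_1\|^2 + \|\boldsymbol{\lambda} - \boldsymbol{\lambda}_1\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\left(\|\nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2 + \|\nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\|^2\right) \\
&\quad + \sum_{t=1}^{T} (\mathbf{x}_t - \mathbf{x})^\top \left(\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) - (\boldsymbol{\lambda}_t - \boldsymbol{\lambda})^\top \left(\nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) \\
&\le \frac{R^2 + D^2}{2\eta} + \frac{\eta}{2} G^2 T \\
&\quad + \sum_{t=1}^{T} (\mathbf{x}_t - \mathbf{x})^\top \left(\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\mathbf{x}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) - (\boldsymbol{\lambda}_t - \boldsymbol{\lambda})^\top \left(\nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}_t, \boldsymbol{\lambda}_t) - \nabla_{\boldsymbol{\lambda}} \mathcal{L}_t(\mathbf{x}_t, \boldsymbol{\lambda}_t)\right) \\
&\le \frac{R^2 + D^2}{2\eta} + \frac{\eta}{2} G^2 T + 2G(R + D)\sqrt{2T\ln\frac{1}{\delta}},
\end{aligned}$$

where the last step uses the Hoeffding inequality for martingales [2]. For any fixed $\lambda_i \in [0, \lambda_i^0]$, $i \in [m]$,
and $\mathbf{x} \in \mathcal{B}$, with a probability $1 - \delta$, we have

$$\bar f_0(\hat{\mathbf{x}}_T) + \sum_{i=1}^{m} \lambda_i \left(\bar f_i(\hat{\mathbf{x}}_T) - \gamma_i\right) - \bar f_0(\mathbf{x}) - \sum_{i=1}^{m} \hat\lambda_T^i \left(\bar f_i(\mathbf{x}) - \gamma_i\right) \le G\sqrt{\frac{R^2 + D^2}{T}} + 2G(R + D)\sqrt{\frac{2}{T}\ln\frac{1}{\delta}}, \qquad (8)$$

where $\hat\lambda_T^i = \sum_{t=1}^{T} \lambda_t^i / T$. By fixing $\mathbf{x} = \mathbf{x}_*$ and $\boldsymbol{\lambda} = \mathbf{0}$ in (8), and using $\bar f_i(\mathbf{x}_*) \le \gamma_i$, $i \in [m]$, we have,
with a probability $1 - \delta$,

$$\bar f_0(\hat{\mathbf{x}}_T) \le \bar f_0(\mathbf{x}_*) + G\sqrt{\frac{R^2 + D^2}{T}} + 2G(R + D)\sqrt{\frac{2}{T}\ln\frac{1}{\delta}}.$$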



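To bound the violation of constraints, for each $i \in [m]$, we set $\mathbf{x} = \mathbf{x}_*$, $\lambda_i = \lambda_i^0$, and $\lambda_j = \lambda_*^j$, $j \ne i$,
in (8). We have

$$\begin{aligned}
&\bar f_0(\hat{\mathbf{x}}_T) + \lambda_i^0 \left(\bar f_i(\hat{\mathbf{x}}_T) - \gamma_i\right) + \sum_{j \ne i} \lambda_*^j \left(\bar f_j(\hat{\mathbf{x}}_T) - \gamma_j\right) - \bar f_0(\mathbf{x}_*) - \sum_{j=1}^{m} \hat\lambda_T^j \left(\bar f_j(\mathbf{x}_*) - \gamma_j\right) \\
&\quad \ge \bar f_0(\hat{\mathbf{x}}_T) + \lambda_i^0 \left(\bar f_i(\hat{\mathbf{x}}_T) - \gamma_i\right) + \sum_{j \ne i} \lambda_*^j \left(\bar f_j(\hat{\mathbf{x}}_T) - \gamma_j\right) - \bar f_0(\mathbf{x}_*) - \sum_{j=1}^{m} \lambda_*^j \left(\bar f_j(\mathbf{x}_*) - \gamma_j\right) \\
&\quad \ge \theta \left(\bar f_i(\hat{\mathbf{x}}_T) - \gamma_i\right),
\end{aligned}$$

where the first inequality utilizes (4) and the second inequality utilizes (3). We thus have, with a
probability $1 - \delta$,

$$\bar f_i(\hat{\mathbf{x}}_T) - \gamma_i \le \frac{G}{\theta}\sqrt{\frac{R^2 + D^2}{T}} + \frac{2G(R + D)}{\theta}\sqrt{\frac{2}{T}\ln\frac{1}{\delta}}.$$

We complete the proof by taking the union bound over all the random events.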
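The step size $\eta$ is not stated in the text above. One choice that reproduces the constant $G\sqrt{(R^2 + D^2)/T}$ appearing in the bounds is the one that balances the two $\eta$-dependent terms of the summed inequality; the following worked step is a sketch under that assumption:

$$\frac{R^2 + D^2}{2\eta} = \frac{\eta}{2} G^2 T \quad\Longleftrightarrow\quad \eta = \frac{\sqrt{R^2 + D^2}}{G\sqrt{T}}, \qquad \text{which gives} \qquad \frac{R^2 + D^2}{2\eta} + \frac{\eta}{2} G^2 T = G\sqrt{(R^2 + D^2)\,T}.$$

Dividing by $T$ then yields $G\sqrt{(R^2 + D^2)/T}$, and the martingale term $2G(R + D)\sqrt{2T\ln(1/\delta)}$ becomes $2G(R + D)\sqrt{(2/T)\ln(1/\delta)}$, so the sum of the two equals $\mu(\delta)/\sqrt{T}$ with $\mu(\delta)$ as in (6).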

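Appendix B. Proof of Theorem 2

Following the proof of Theorem 1, with a probability $1 - \delta$, we have

$$\bar f_0(\hat{\mathbf{x}}_T) + \sum_{i=1}^{m} \lambda_i \left(\bar f_i(\hat{\mathbf{x}}_T) - \hat\gamma_i\right) - \bar f_0(\mathbf{x}) - \sum_{i=1}^{m} \hat\lambda_T^i \left(\bar f_i(\mathbf{x}) - \hat\gamma_i\right) \le \frac{\mu(\delta)}{\sqrt{T}}. \qquad (9)$$

Using the definition $\hat\gamma_i = \gamma_i - \frac{\mu(\delta)}{\theta\sqrt{T}}$, we have

$$\bar f_0(\hat{\mathbf{x}}_T) + \sum_{i=1}^{m} \lambda_i \left(\bar f_i(\hat{\mathbf{x}}_T) - \gamma_i\right) - \bar f_0(\mathbf{x}) - \sum_{i=1}^{m} \hat\lambda_T^i \left(\bar f_i(\mathbf{x}) - \gamma_i\right) \le \frac{\mu(\delta)}{\sqrt{T}} + \frac{\mu(\delta)}{\theta\sqrt{T}} \sum_{i=1}^{m} \left(\hat\lambda_T^i - \lambda_i\right). \qquad (10)$$

We complete the proof by following the same steps as in Theorem 1 and using the fact $\hat\lambda_T^i \le \lambda_i^0$.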

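Appendix C. Proof of Proposition 1

We can rewrite (7) as

$$\min_{\mathbf{x} \in \mathcal{B}} \max_{\lambda \ge 0,\ \mathbf{p} \in \Delta_m} \bar f_0(\mathbf{x}) + \sum_{i=1}^{m} p_i \lambda \left(\bar f_i(\mathbf{x}) - \gamma_i\right),$$

where $\Delta_m$ is a simplex. By redefining $\lambda_i = p_i \lambda$, we have the problem in (7) equivalent to (2) with
$\lambda = \sum_{i=1}^{m} \lambda_i$.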

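Appendix D. Proof of Lemma 1

Using the first order optimality condition, we have $\lambda_a = \frac{\|\nabla \bar f_0(\mathbf{x}_*)\|}{\|\partial g(\mathbf{x}_*)\|}$, where $g(\mathbf{x}) = \max_{1 \le i \le m} \bar f_i(\mathbf{x})$.
Since $\partial g(\mathbf{x}) \in \left\{\sum_{i=1}^{m} \alpha_i \nabla \bar f_i(\mathbf{x}) : \boldsymbol{\alpha} \in \Delta_m\right\}$, we complete the proof using Assumption 1.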